SWOT Simulated North American Continent Hydrology Dataset Exploration in the Cloud

Accessing and Visualizing SWOT Simulated Datasets

Learning Objectives:

  • Access all 5 products of SWOT HR sample data (archived in NASA Earthdata Cloud) within the AWS cloud, without downloading them to a local machine
  • Visualize accessed data

SWOT Simulated Level 2 North America Continent KaRIn High Rate Version 1 Datasets:

  1. River Vector Shapefile - SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1

DOI: https://doi.org/10.5067/KARIN-2RSP1

  2. Lake Vector Shapefile - SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1

DOI: https://doi.org/10.5067/KARIN-2LSP1

  3. Water Mask Pixel Cloud NetCDF - SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXC_V1

DOI: https://doi.org/10.5067/KARIN-2PIX1

  4. Water Mask Pixel Cloud Vector Attribute NetCDF - SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXCVEC_V1

DOI: https://doi.org/10.5067/KARIN-2PXV1

  5. Raster NetCDF - SWOT_SIMULATED_NA_CONTINENT_L2_HR_RASTER_V1

DOI: https://doi.org/10.5067/KARIN-2RAS1

Notebook Author: Cassie Nickles, NASA PO.DAAC (Aug 2022)

Libraries Needed

import glob
import os
import requests
import s3fs
import netCDF4 as nc
import h5netcdf
import xarray as xr
import pandas as pd
import geopandas as gpd
import numpy as np
import matplotlib.pyplot as plt
import hvplot.xarray
import shapefile as shp
import zipfile

Get Temporary AWS Credentials for Access

S3 is an ‘object store’ hosted in AWS for cloud processing. Direct S3 access is achieved by passing NASA-supplied temporary credentials to AWS so we can interact with S3 objects from applicable Earthdata Cloud buckets. Note that these temporary credentials are valid for only 1 hour. A netrc file is required to acquire these credentials. Use the NASA Earthdata Authentication to create a netrc file in your home directory. (Note: A NASA Earthdata Login is required to access data from the NASA Earthdata system. Please visit https://urs.earthdata.nasa.gov to register and manage your Earthdata Login account. This account is free to create and only takes a moment to set up.)
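
Before requesting credentials, it can help to confirm that the netrc file actually contains an Earthdata Login entry. The following is a minimal sketch using Python's standard netrc module; the helper name has_earthdata_login, the default path ~/.netrc, and the demo login values are assumptions made for illustration, not part of the original notebook.

```python
import netrc
import os
import tempfile

EARTHDATA_HOST = "urs.earthdata.nasa.gov"


def has_earthdata_login(netrc_path=None):
    """Return True if the netrc file has an entry for Earthdata Login."""
    # Assumption: the netrc file lives at ~/.netrc unless a path is given.
    path = netrc_path or os.path.join(os.path.expanduser("~"), ".netrc")
    try:
        entry = netrc.netrc(path).authenticators(EARTHDATA_HOST)
    except (FileNotFoundError, netrc.NetrcParseError):
        return False
    return entry is not None


# Demonstration with a throwaway file (hypothetical login values).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "netrc")
    with open(demo_path, "w") as f:
        f.write(f"machine {EARTHDATA_HOST} login demo_user password demo_pass\n")
    found = has_earthdata_login(demo_path)
    missing = has_earthdata_login(os.path.join(tmp_dir, "absent"))

print(found, missing)
```

Calling has_earthdata_login() with no argument checks the file in your home directory.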

The following credentials endpoint is for PO.DAAC; other endpoints are needed to access data from other NASA DAACs.

s3_cred_endpoint = 'https://archive.podaac.earthdata.nasa.gov/s3credentials'

Create a function to make a request to an endpoint for temporary credentials.

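def get_temp_creds():
    # Request temporary S3 credentials; requests reads the Earthdata Login from the netrc file
    temp_creds_url = s3_cred_endpoint
    response = requests.get(temp_creds_url)
    response.raise_for_status()      # fail early on an authentication or HTTP error
    return response.json()

temp_creds_req = get_temp_creds()
#temp_creds_req                      # !!! BEWARE, removing the # on this line will print your temporary S3 credentials.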
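
Because the temporary credentials last only 1 hour, a long session may need to request them again. One possible approach, sketched below, is a small cache that calls the fetch function again once the stored credentials are older than a chosen age. The class name TempCredentialCache, the 55-minute default, and the fake fetch function and clock used in the demonstration are all assumptions for illustration.

```python
import time


class TempCredentialCache:
    """Re-fetch temporary credentials once they are older than max_age_s."""

    def __init__(self, fetch, max_age_s=55 * 60, clock=time.monotonic):
        self._fetch = fetch          # callable returning a credentials dict
        self._max_age_s = max_age_s  # refresh slightly before the 1 hour limit
        self._clock = clock
        self._creds = None
        self._fetched_at = None

    def get(self):
        now = self._clock()
        if self._creds is None or now - self._fetched_at >= self._max_age_s:
            self._creds = self._fetch()
            self._fetched_at = now
        return self._creds


# Demonstration with a fake fetch function and a fake clock (no network access).
calls = []
fake_now = [0.0]


def fake_fetch():
    calls.append(fake_now[0])
    return {"accessKeyId": f"demo-{len(calls)}"}


cache = TempCredentialCache(fake_fetch, clock=lambda: fake_now[0])
first = cache.get()
fake_now[0] = 10 * 60
second = cache.get()   # 10 minutes old: served from the cache
fake_now[0] = 56 * 60
third = cache.get()    # 56 minutes old: fetched again
print(len(calls), third["accessKeyId"])
```

In this notebook, the cache could wrap the function defined above, as in TempCredentialCache(get_temp_creds). Any s3fs session built from expired credentials would also need to be recreated with the refreshed values.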

Set up an s3fs session for Direct Access

s3fs sessions are used for authenticated access to S3 buckets and allow for typical file-system style operations. Below we create a session by passing in the temporary credentials we received from our temporary credentials endpoint.

fs_s3 = s3fs.S3FileSystem(anon=False, 
                          key=temp_creds_req['accessKeyId'], 
                          secret=temp_creds_req['secretAccessKey'], 
                          token=temp_creds_req['sessionToken'],
                          client_kwargs={'region_name':'us-west-2'})

Single File Access

The S3 access link for a single file can be found using Earthdata Search (see tutorial). The links used for each product are as follows:

1. River Vector Shapefiles

s3_SWOT_HR_url1 = 's3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1/SWOT_L2_HR_RiverSP_Reach_007_522_NA_20220822T192441_20220822T193037_PGA0_01.zip'
s3_file_obj1 = fs_s3.open(s3_SWOT_HR_url1, mode='rb')
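
The file names of the vector products appear to encode the feature type, cycle, pass, continent, and time range of each granule. The sketch below parses those fields with a regular expression. The interpretation of each field is an assumption inferred from the names used in this notebook, and parse_vector_name is a hypothetical helper. Comparing the parsed fields is one way to check that a local .shp path belongs to the archive that was opened.

```python
import posixpath
import re
from datetime import datetime

# Assumed layout, inferred from the RiverSP and LakeSP names in this notebook:
# SWOT_L2_HR_<product>_<feature>_<cycle>_<pass>_<continent>_<start>_<end>_<crid>_<counter>.<ext>
VECTOR_NAME = re.compile(
    r"^SWOT_L2_HR_(?P<product>RiverSP|LakeSP)_(?P<feature>[A-Za-z]+)_"
    r"(?P<cycle>\d{3})_(?P<pass_id>\d{3})_(?P<continent>[A-Z]{2})_"
    r"(?P<start>\d{8}T\d{6})_(?P<end>\d{8}T\d{6})_"
    r"(?P<crid>[A-Za-z0-9]+)_(?P<counter>\d{2})\.(?P<ext>\w+)$"
)


def parse_vector_name(path):
    """Split a RiverSP or LakeSP file name into its labeled parts."""
    name = posixpath.basename(path)
    match = VECTOR_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized file name: {name}")
    info = match.groupdict()
    info["cycle"] = int(info["cycle"])
    info["pass_id"] = int(info["pass_id"])
    for key in ("start", "end"):
        info[key] = datetime.strptime(info[key], "%Y%m%dT%H%M%S")
    return info


archive = parse_vector_name(
    "s3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_RIVERSP_V1/"
    "SWOT_L2_HR_RiverSP_Reach_007_522_NA_20220822T192441_20220822T193037_PGA0_01.zip"
)
local = parse_vector_name(
    "SWOT_HR_shp/SWOT_L2_HR_RiverSP_Node_007_022_NA_20220804T224145_20220804T224402_PGA0_01.shp"
)
same_granule = all(
    archive[k] == local[k] for k in ("feature", "cycle", "pass_id", "start", "end")
)
print(archive["feature"], archive["pass_id"], same_granule)  # Reach 522 False
```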

The native format for this sample data is a .zip file, and we want the .shp file within the .zip file, so we need to extract the contents of the zip file into the cloud environment. I created a folder called SWOT_HR_shp to write to; change the path to wherever you would like your extracted files to be written.

with zipfile.ZipFile(s3_file_obj1, 'r') as zip_ref:
    zip_ref.extractall('SWOT_HR_shp')
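
Rather than typing the extracted file name by hand, the names of the .shp members can be read from the archive itself. The sketch below uses only the standard zipfile module and builds a small in-memory archive for demonstration; shapefile_names is a hypothetical helper and the demo member names are made up.

```python
import io
import zipfile


def shapefile_names(zip_file):
    """Return the names of all .shp members in an open ZipFile."""
    return [n for n in zip_file.namelist() if n.lower().endswith(".shp")]


# Build a small in-memory archive with made-up member names for the demonstration.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for ext in ("shp", "shx", "dbf", "prj"):
        zf.writestr(f"demo_reach.{ext}", b"")

buffer.seek(0)
with zipfile.ZipFile(buffer, "r") as zip_ref:
    shp_names = shapefile_names(zip_ref)

print(shp_names)  # ['demo_reach.shp']
```

With the real archive, the same helper could be called on zip_ref after extraction, and os.path.join('SWOT_HR_shp', shp_names[0]) passed to gpd.read_file.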

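Next, we’ll look at the attribute table of a .shp file in the ‘SWOT_HR_shp’ folder. The path passed to gpd.read_file must match the name of a .shp file that was actually extracted; note that the example below reads a Node shapefile, whose name differs from the Reach archive opened above.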

SWOT_HR_shp1 = gpd.read_file('SWOT_HR_shp/SWOT_L2_HR_RiverSP_Node_007_022_NA_20220804T224145_20220804T224402_PGA0_01.shp') 
SWOT_HR_shp1
reach_id node_id time time_tai time_str lat lon lat_u lon_u river_name ... p_wse p_wse_var p_width p_wid_var p_dist_out p_length p_dam_id p_n_ch_max p_n_ch_mod geometry
0 74292500281 74292500280011 -1.000000e+12 -1.000000e+12 no_data -1.000000e+12 -1.000000e+12 -1.000000e+12 -1.000000e+12 no_data ... 497.100006 0.0 42.0 5.000000 3172049.666 185.281576 0 1 1 POINT (-98.32889 40.05984)
1 74292500281 74292500280021 -1.000000e+12 -1.000000e+12 no_data -1.000000e+12 -1.000000e+12 -1.000000e+12 -1.000000e+12 no_data ... 497.100006 0.0 42.0 39.673469 3172264.948 215.281576 0 1 1 POINT (-98.32932 40.06160)
2 74292500281 74292500280031 -1.000000e+12 -1.000000e+12 no_data -1.000000e+12 -1.000000e+12 -1.000000e+12 -1.000000e+12 no_data ... 497.100006 0.0 42.0 23.040000 3172446.742 181.794637 0 1 1 POINT (-98.33016 40.06319)
3 74292500281 74292500280041 -1.000000e+12 -1.000000e+12 no_data -1.000000e+12 -1.000000e+12 -1.000000e+12 -1.000000e+12 no_data ... 497.100006 0.0 42.0 7.346939 3172667.769 221.026330 0 1 1 POINT (-98.33220 40.06402)
4 74292500281 74292500280051 -1.000000e+12 -1.000000e+12 no_data -1.000000e+12 -1.000000e+12 -1.000000e+12 -1.000000e+12 no_data ... 497.100006 0.0 36.0 17.000000 3172852.456 184.687478 0 1 1 POINT (-98.33448 40.06453)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
10661 74297800043 74297800040423 7.129681e+08 7.129681e+08 2022-08-04T22:4146Z 4.754130e+01 -1.025881e+02 3.624076e-05 1.838268e-04 no_data ... 560.100037 0.0 996.0 128.560000 4377666.061 175.686989 0 1 1 POINT (-102.58881 47.54064)
10662 74297800043 74297800040433 7.129681e+08 7.129681e+08 2022-08-04T22:4146Z 4.754127e+01 -1.025911e+02 2.999256e-05 1.520951e-04 no_data ... 560.100037 0.0 987.0 69.555556 4377867.951 201.890251 0 1 1 POINT (-102.59097 47.54146)
10663 74297800043 74297800040443 7.129681e+08 7.129681e+08 2022-08-04T22:4146Z 4.754171e+01 -1.025939e+02 3.369788e-04 1.708933e-03 no_data ... 560.100037 0.0 1027.5 271.555556 4378082.057 214.105755 0 1 1 POINT (-102.59332 47.54237)
10664 74297800043 74297800040453 7.129681e+08 7.129681e+08 2022-08-04T22:4146Z 4.754353e+01 -1.025954e+02 8.255214e-03 4.184626e-02 no_data ... 560.100037 0.0 1077.0 592.560000 4378263.852 181.794740 0 1 1 POINT (-102.59546 47.54344)
10665 74297800043 74297800040463 7.129681e+08 7.129681e+08 2022-08-04T22:4146Z 4.754496e+01 -1.025973e+02 6.227163e-05 3.157206e-04 no_data ... 560.100037 0.0 1135.5 1022.583333 4378438.780 174.928676 0 1 1 POINT (-102.59761 47.54443)

10666 rows × 53 columns
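
In the table above, the time column holds large numbers while time_str holds a readable timestamp. The sketch below converts one into the other, assuming that time counts seconds since 2000-01-01 00:00:00 UTC (an assumption that is consistent with the values shown) and that negative values such as the -1.000000e+12 placeholder mean the time is missing. The helper name and the full-precision input value are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the time column counts seconds since this epoch, in UTC.
SWOT_EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def swot_time_to_datetime(seconds):
    """Convert a time value to a datetime, or None for a negative placeholder."""
    if seconds < 0:
        return None
    return SWOT_EPOCH + timedelta(seconds=seconds)


# 712968106 is a hypothetical full-precision value that rounds to 7.129681e+08.
converted = swot_time_to_datetime(712968106.0)
placeholder = swot_time_to_datetime(-999999999999.0)
print(converted)    # 2022-08-04 22:41:46+00:00
print(placeholder)  # None
```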

fig, ax = plt.subplots(figsize=(11,7))
SWOT_HR_shp1.plot(ax=ax, color='black')

2. Lake Vector Shapefiles

The lake vector shapefiles can be accessed in the same way as the river shapefiles above.

s3_SWOT_HR_url2 = 's3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_LAKESP_V1/SWOT_L2_HR_LakeSP_Obs_007_522_NA_20220822T192415_20220822T193051_Dx0000_01.zip'
s3_file_obj2 = fs_s3.open(s3_SWOT_HR_url2, mode='rb')
with zipfile.ZipFile(s3_file_obj2, 'r') as zip_ref:
    zip_ref.extractall('SWOT_HR_shp')
SWOT_HR_shp2 = gpd.read_file('SWOT_HR_shp/SWOT_L2_HR_LakeSP_Obs_007_522_NA_20220822T192415_20220822T193051_Dx0000_01.shp') 
SWOT_HR_shp2
obs_id lake_id overlap time time_tai time_str wse wse_u wse_r_u wse_std ... iono_c xovr_cal_c p_name p_grand_id p_max_wse p_max_area p_ref_date p_ref_ds p_storage geometry
0 742081R000002 7420470702 93 7.145116e+08 7.145116e+08 2022-08-22T19:26:51 36.934 0.051 0.051 0.159 ... 0.0 0.0 no_data -99999999 -1.000000e+12 1.35 -9999 -9999.0 -1.000000e+12 POLYGON ((-92.75926 42.04142, -92.75977 42.041...
1 742081R000003 7420472462 75 7.145116e+08 7.145116e+08 2022-08-22T19:26:51 37.037 0.080 0.080 0.143 ... 0.0 0.0 no_data -99999999 -1.000000e+12 1.62 -9999 -9999.0 -1.000000e+12 POLYGON ((-92.91651 42.01167, -92.91681 42.011...
2 742081R000008 7420473212 58 7.145116e+08 7.145116e+08 2022-08-22T19:26:51 36.578 0.181 0.181 0.058 ... 0.0 0.0 HENDRICKSON MARSH LAKE -99999999 -1.000000e+12 45.94 -9999 -9999.0 -1.000000e+12 POLYGON ((-93.24060 41.93319, -93.24066 41.933...
3 742081R000009 7420470712 73 7.145116e+08 7.145116e+08 2022-08-22T19:26:51 36.910 0.110 0.110 0.136 ... 0.0 0.0 no_data -99999999 -1.000000e+12 4.50 -9999 -9999.0 -1.000000e+12 POLYGON ((-92.72557 42.03424, -92.72560 42.034...
4 742081R000011 7420470582 76 7.145116e+08 7.145116e+08 2022-08-22T19:26:51 36.904 0.109 0.109 0.628 ... 0.0 0.0 no_data -99999999 -1.000000e+12 1.89 -9999 -9999.0 -1.000000e+12 POLYGON ((-93.39929 41.90871, -93.39945 41.908...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
17240 742070R000729 7420857422 92 7.145115e+08 7.145115e+08 2022-08-22T19:25:10 33.186 0.098 0.098 0.056 ... 0.0 0.0 no_data -99999999 -1.000000e+12 14.94 -9999 -9999.0 -1.000000e+12 POLYGON ((-95.07581 47.57853, -95.07587 47.578...
17241 742070R000730 7420848152 96 7.145115e+08 7.145115e+08 2022-08-22T19:25:10 33.071 0.038 0.038 0.174 ... 0.0 0.0 no_data -99999999 -1.000000e+12 3.06 -9999 -9999.0 -1.000000e+12 POLYGON ((-94.88532 47.61669, -94.88577 47.616...
17242 712070R000731 7120812272 74 7.145115e+08 7.145115e+08 2022-08-22T19:25:10 33.104 0.077 0.077 0.085 ... 0.0 0.0 no_data -99999999 -1.000000e+12 4.41 -9999 -9999.0 -1.000000e+12 POLYGON ((-95.08313 47.57179, -95.08341 47.571...
17243 712070R000732 7120816202 67 7.145115e+08 7.145115e+08 2022-08-22T19:25:10 32.713 0.106 0.106 0.198 ... 0.0 0.0 MUD LAKE -99999999 -1.000000e+12 6.30 -9999 -9999.0 -1.000000e+12 POLYGON ((-95.39968 47.51004, -95.39975 47.510...
17244 742070R000733 7420857422 95 7.145115e+08 7.145115e+08 2022-08-22T19:25:10 32.725 0.093 0.093 0.119 ... 0.0 0.0 no_data -99999999 -1.000000e+12 14.94 -9999 -9999.0 -1.000000e+12 POLYGON ((-95.07228 47.57473, -95.07257 47.574...

17245 rows × 43 columns

fig, ax = plt.subplots(figsize=(7,12))
SWOT_HR_shp2.plot(ax=ax, color='black')

3. Water Mask Pixel Cloud NetCDF

Accessing the remaining files differs from accessing the shapefiles above. We do not need to unzip the files because they are stored as native netCDF files in the cloud. For the rest of the products, we will open the files with xarray.

s3_SWOT_HR_url3 = 's3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXC_V1/SWOT_L2_HR_PIXC_007_522_094R_20220822T192900_20220822T192911_Dx0000_01.nc'
s3_file_obj3 = fs_s3.open(s3_SWOT_HR_url3, mode='rb')

The pixel cloud netCDF files are organized into three groups named “pixel_cloud”, “tvp”, and “noise” (more detail here). In order to access the coordinates and variables within the file, a group must be specified when calling xarray open_dataset.

ds_PIXC = xr.open_dataset(s3_file_obj3, group = 'pixel_cloud', engine='h5netcdf')
ds_PIXC
<xarray.Dataset>
Dimensions:                                (points: 769290, complex_depth: 2)
Coordinates:
    latitude                               (points) float64 ...
    longitude                              (points) float64 ...
Dimensions without coordinates: points, complex_depth
Data variables: (12/49)
    azimuth_index                          (points) float64 ...
    range_index                            (points) float64 ...
    interferogram                          (points, complex_depth) float32 ...
    power_plus_y                           (points) float32 ...
    power_minus_y                          (points) float32 ...
    coherent_power                         (points) float32 ...
    ...                                     ...
    solid_earth_tide                       (points) float32 ...
    load_tide_fes                          (points) float32 ...
    load_tide_got                          (points) float32 ...
    pole_tide                              (points) float32 ...
    ancillary_surface_classification_flag  (points) float32 ...
    pixc_qual                              (points) float32 ...
Attributes:
    description:                 cloud of geolocated interferogram pixels
    interferogram_size_azimuth:  2924
    interferogram_size_range:    4575
    looks_to_efflooks:           1.75

plt.scatter(x=ds_PIXC.longitude, y=ds_PIXC.latitude, c=ds_PIXC.height)
plt.colorbar().set_label('Height (m)')

4. Water Mask Pixel Cloud Vector Attribute NetCDF

s3_SWOT_HR_url4 = 's3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_PIXCVEC_V1/SWOT_L2_HR_PIXCVec_007_522_094R_20220822T192900_20220822T192911_Dx0000_01.nc'
s3_file_obj4 = fs_s3.open(s3_SWOT_HR_url4, mode='rb')
ds_PIXCVEC = xr.open_dataset(s3_file_obj4, decode_cf=False,  engine='h5netcdf')
ds_PIXCVEC
<xarray.Dataset>
Dimensions:               (points: 769290, nchar_reach_id: 11,
                           nchar_node_id: 14, nchar_lake_id: 10,
                           nchar_obs_id: 13)
Dimensions without coordinates: points, nchar_reach_id, nchar_node_id,
                                nchar_lake_id, nchar_obs_id
Data variables:
    azimuth_index         (points) int32 ...
    range_index           (points) int32 ...
    latitude_vectorproc   (points) float64 ...
    longitude_vectorproc  (points) float64 ...
    height_vectorproc     (points) float32 ...
    reach_id              (points, nchar_reach_id) |S1 ...
    node_id               (points, nchar_node_id) |S1 ...
    lake_id               (points, nchar_lake_id) |S1 ...
    obs_id                (points, nchar_obs_id) |S1 ...
    ice_clim_f            (points) int8 ...
    ice_dyn_f             (points) int8 ...
Attributes: (12/36)
    Conventions:                                 CF-1.7
    title:                                       Level 2 KaRIn high rate pixe...
    institution:                                 CNES
    source:                                      Simulation
    history:                                     2021-04-14 17:35:28Z: Creation
    platform:                                    SWOT
    ...                                          ...
    xref_input_l2_hr_pixc_vec_river_file:        /work/ALT/swot/swotdev/desro...
    xref_static_river_db_file:                   
    xref_static_lake_db_file:                    /work/ALT/swot/swotpub/BD/BD...
    xref_l2_hr_lake_tile_config_parameter_file:  /work/ALT/swot/swotdev/desro...
    ellipsoid_semi_major_axis:                   6371008.771416667
    ellipsoid_flattening:                        0.0

pixcvec_htvals = ds_PIXCVEC.height_vectorproc
pixcvec_latvals = ds_PIXCVEC.latitude_vectorproc
pixcvec_lonvals = ds_PIXCVEC.longitude_vectorproc

# The file was opened with decode_cf=False, so fill values are not masked automatically.
# Before plotting, set all fill values to NaN so they do not distort the spatial extent of the plot.
pixcvec_htvals[pixcvec_htvals > 15000] = np.nan
pixcvec_latvals[pixcvec_latvals > 80] = np.nan
pixcvec_lonvals[pixcvec_lonvals > 180] = np.nan
plt.scatter(x=pixcvec_lonvals, y=pixcvec_latvals, c=pixcvec_htvals)
plt.colorbar().set_label('Height (m)')
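
The threshold masking used above can also be written as a small function that returns a masked copy and leaves the input untouched. This is a sketch with plain NumPy arrays rather than the notebook's xarray objects; mask_above is a hypothetical helper and the sample values, including the very large stand-in for a fill value, are made up.

```python
import numpy as np


def mask_above(values, threshold):
    """Return a float copy of values with entries above threshold set to NaN."""
    values = np.asarray(values, dtype=float)
    return np.where(values > threshold, np.nan, values)


# Made-up sample: the middle entry stands in for an unmasked fill value.
heights = np.array([12.5, 9.96921e36, 30.0])
masked = mask_above(heights, 15000)
print(masked)
```

Because np.where builds a new array, heights still holds the original values after the call.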

5. Raster NetCDF

s3_SWOT_HR_url5 = 's3://podaac-ops-cumulus-protected/SWOT_SIMULATED_NA_CONTINENT_L2_HR_RASTER_V1/SWOT_L2_HR_Raster_100m_UTM15S_N_x_x_x_007_522_047F_20220822T192850_20220822T192911_Dx0000_01.nc'
s3_file_obj5 = fs_s3.open(s3_SWOT_HR_url5, mode='rb')
ds_raster = xr.open_dataset(s3_file_obj5, engine='h5netcdf')
ds_raster
<xarray.Dataset>
Dimensions:                (x: 1543, y: 1540)
Coordinates:
  * x                      (x) float64 6.567e+05 6.568e+05 ... 8.109e+05
  * y                      (y) float64 3.775e+06 3.775e+06 ... 3.929e+06
Data variables: (12/30)
    crs                    object b'1'
    longitude              (y, x) float64 ...
    latitude               (y, x) float64 ...
    wse                    (y, x) float32 ...
    wse_uncert             (y, x) float32 ...
    water_area             (y, x) float32 ...
    ...                     ...
    load_tide_fes          (y, x) float32 ...
    load_tide_got          (y, x) float32 ...
    pole_tide              (y, x) float32 ...
    model_dry_tropo_cor    (y, x) float32 ...
    model_wet_tropo_cor    (y, x) float32 ...
    iono_cor_gim_ka        (y, x) float32 ...
Attributes: (12/45)
    Conventions:                     CF-1.7
    title:                           Level 2 KaRIn High Rate Raster Data Product
    institution:                     JPL
    source:                          Large scale simulator
    history:                         2021-09-08T22:28:33Z : Creation
    mission_name:                    SWOT
    ...                              ...
    utm_zone_num:                    15
    mgrs_latitude_band:              S
    x_min:                           656700.0
    x_max:                           810900.0
    y_min:                           3775000.0
    y_max:                           3928900.0
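
The x_min, x_max, y_min, and y_max attributes can be cross-checked against the dataset dimensions. The sketch below assumes a grid spacing of 100 m, taken from the "100m" in the file name, and that the min and max values refer to the first and last cell coordinates; n_cells is a hypothetical helper.

```python
def n_cells(v_min, v_max, spacing):
    """Number of grid coordinates from v_min to v_max inclusive at a fixed spacing."""
    return round((v_max - v_min) / spacing) + 1


SPACING_M = 100.0  # assumption: taken from "100m" in the raster file name

nx = n_cells(656700.0, 810900.0, SPACING_M)    # x_min, x_max from the attributes
ny = n_cells(3775000.0, 3928900.0, SPACING_M)  # y_min, y_max from the attributes
print(nx, ny)  # 1543 1540
```

These counts match the (x: 1543, y: 1540) dimensions reported for ds_raster.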

It’s easy to analyze and plot the data with packages such as hvplot!

ds_raster.wse.hvplot.image(y='y', x='x')